Skip to content

Conversation

DerGuteMoritz
Copy link
Collaborator

@DerGuteMoritz DerGuteMoritz commented Sep 19, 2025

Also, make test-alt a bit more reliable and make the exception messages distinct so that a dropped error can easily be matched up with the deferred.

Note this only fixes cases which are due to the test code itself. All other failing tests require changes in Manifold itself.

Based on #245.

Also, make test-alt a bit more reliable and make the exception messages distinct so that a dropped
error can easily be matched up with the deferred.
Copy link
Collaborator

@KingMob KingMob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems ok

(deftest test-alt
(is (#{1 2 3} @(d/alt 1 2 3)))
(is (= 2 @(d/alt (d/future (Thread/sleep 10) 1) 2)))
(is (= 2 @(d/alt (d/future (Thread/sleep 100) 1) 2)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the 10 making this test flaky?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Probably because there's a non-zero chance of the following sequence:

  1. deftest thread starts the future thread.
  2. deftest thread gets swapped out
  3. future thread sleeps, runs, and returns
  4. deftest thread resumes, and starts the alt logic checking its possible values in random order
  5. deftest thread checks the future results before "2", finds it's finished, and returns

It's unlikely, but not impossible. The longer the future thread sleeps, the less likely the alt logic is to be delayed.


Honestly, this test is not great. Using unit tests on probabilistic behaviors was always a recipe for flakiness.

One quick way to improve consistency is to set a seed during testing, so the random order used by alt/alt' is always the same.

It won't fix the underlying issue, though. As the sleep time approaches the slice time given threads, the odds of the "sleeping" thread finishing before alt checks any of its values, goes way up.

One possibility is to run it repeatedly, and set a threshold for the expected proportion that select 1 vs 2. Of course, that proportion will still be affected by hardware, so...

Copy link
Collaborator Author

@DerGuteMoritz DerGuteMoritz Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, @KingMob's hypothesis is also what I arrived at after having seen it fail like that occasionally. This happened while I was working on the patch. I repeatedly ran the test from the REPL so there was probably a bit more thread scheduling going on compared to running the test suite once with lein test.

I agree that this test isn't great. However, I just realized that we can actually make it fully deterministic - see ec01b4a!

Involving `d/future` makes these tests needlessly flaky. By decoupling the `d/alt` invocation from
dereferencing its result, we can deterministically control the order of events.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants